Abstract: I will show that there is a deep relation between error-correction codes and certain mathematical models
of spin glasses. In particular minimum error probability decoding is equivalent to finding the ground state of the
corresponding spin system. The most probable value of a symbol is related to the magnetization at a different
temperature. Convolutional codes correspond to one-dimensional spin systems and Viterbi's decoding algorithm to
the transfer matrix algorithm of Statistical Mechanics.

I will also show how the recently discovered (or rediscovered) capacity approaching codes (turbo codes and
low density parity check codes) can be analysed using statistical mechanics. It is possible to show, using statistical
mechanics, that these codes allow error-free communication for signal to noise ratio above a certain threshold. This
threshold depends on the particular code, and can be computed analytically in many cases.
It has been known [1-4] that error-correcting codes are mathematically equivalent to some
theoretical spin-glass models. As explained in Forney's paper in this volume, there have recently
been very interesting new developments in error-correcting codes. It is now possible, in practice,
to come very close to Shannon's channel capacity. First came the discovery of turbo codes by
Berrou and Glavieux[5] and later the rediscovery of low density parity check codes[6], first discovered
by Gallager[7, 8], in his thesis, in 1962. Both turbo codes and low density parity check (LDPC) codes
are based on random structures. It turns out, as I will explain later, that it is possible to use their
equivalence with spin glasses to analyse them using the methods of statistical mechanics.

Let me start by fixing the notation. Each information message consists of a sequence of K bits
$u = \{u_1, \cdots, u_K\}$, $u_i = 0$ or 1. The binary vector u is called the source-word. Encoding introduces
redundancy into the message. One maps $u \to x$ by encoding; $u \to x$ has to be a one to one map for
the code to be meaningful. The binary vector x has N > K components. It is called a code-word.
The ratio R = K/N, which specifies the redundancy of the code, is called the rate of the code. One
particularly important family of codes are the so-called linear codes. Linear codes are defined by

$$x = G u$$

G is a binary (i.e. its elements are zero or one) $(N \times K)$ matrix and the multiplication is modulo
two. G is called the generating matrix of the code. Obviously, by construction, the components
$x_i$ of a code-word x are not all independent. Of all the $2^N$ binary vectors only $2^K = 2^{NR}$, those
corresponding to a vector u, are code-words. Code-words satisfy the linear constraints (called parity
check constraints) $Hx = 0$ (modulo two), where H is an $((N-K) \times N)$ binary matrix, called the parity check
matrix. The connection with spin variables is straightforward: $u_i \to \sigma_i = (-1)^{u_i}$, $x_i \to J_i = (-1)^{x_i}$.
It follows that $u_i + u_j \to \sigma_i \sigma_j$ and

$$J_i = (-1)^{\sum_j G_{ij} u_j} = C^{(i)}_{k_1 \cdots k_i}\, \sigma_{k_1} \cdots \sigma_{k_i} \tag{1}$$

The previous equation defines the "connectivity matrix" C in terms of the generating matrix of the
code G. Similarly one can write the parity check constraints in the form:

$$(-1)^{\sum_j H_{lj} x_j} = 1 \;\Rightarrow\; M^{(l)}_{k_1 \cdots k_l}\, J_{k_1} \cdots J_{k_l} = 1 \tag{2}$$

This defines the "parity constraint matrix" M in terms of the parity check matrix H of the code.
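
The bit-to-spin dictionary above can be checked on a small example. The following is only a minimal sketch: the generating matrix `G`, the parity check matrix `H` and the source word `u` are hypothetical toy values chosen for illustration (they are not taken from any code discussed here), and the helper names are my own.

```python
import numpy as np

# Hypothetical toy code with N = 5, K = 3 (illustration only).
G = np.array([[1, 0, 0],
              [0, 1, 0],
              [0, 0, 1],
              [1, 1, 0],
              [0, 1, 1]])
# A matching parity check matrix with N - K = 2 rows, so that H G = 0 (mod 2).
H = np.array([[1, 1, 0, 1, 0],
              [0, 1, 1, 0, 1]])

def encode(G, u):
    """Code-word x = G u, with the multiplication taken modulo two."""
    return (G @ u) % 2

def to_spins(bits):
    """Map bits {0, 1} to spins {+1, -1} via s = (-1)**bit."""
    return 1 - 2 * np.asarray(bits)

def products_over_rows(matrix, spins):
    """For each row, multiply the spins sitting where the row has a one."""
    return np.array([np.prod(spins[row == 1]) for row in matrix])

u = np.array([1, 0, 1])
x = encode(G, u)
sigma = to_spins(u)                               # source spins
J = to_spins(x)                                   # code-word spins
J_from_sigma = products_over_rows(G, sigma)       # right-hand side of Eq. (1)
parity_products = products_over_rows(H, J)        # left-hand side of Eq. (2)
```

Here `J_from_sigma` reproduces `J`, which is the content of Eq. (1), and every entry of `parity_products` equals one, which is Eq. (2).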

Code-words are sent through a noisy transmission channel and they get corrupted because of
the channel noise. If a $J_i = \pm 1$ is sent, the output will be different, in general a real number $J_i^{out}$.
Let us call $Q(J^{out}|J)\,dJ^{out}$ the probability for the transmission channel's output to be between $J^{out}$
and $J^{out} + dJ^{out}$, when the input was J. The channel "transition matrix" $Q(J^{out}|J)$ is supposed to be
known. We will assume that the noise is independent for any pair of bits ("memoryless channel"),

$$Q(J^{out}|J) = \prod_i q(J_i^{out}|J_i) \tag{3}$$

Communication is a statistical inference problem. Knowing the noise probability, i.e. $q(J_i^{out}|J_i)$, the
code (i.e. in the present case of linear codes, knowing the generating matrix G or the parity check
matrix H) and the channel output $J^{out}$, one has to infer the message that was sent. The quality of
inference depends on the choice of the code.

We will now show that there exists a close mathematical relationship between error-correcting
codes and theoretical models of disordered systems. To every possible information message (source
word) $\tau$ we can assign a probability $P^{source}(\tau|J^{out})$ conditional on the channel output $J^{out}$. Or,
equivalently, to any code-word J we can assign a probability $P^{code}(J|J^{out})$.

Because of Bayes' theorem, the probability for any code-word symbol ("letter") $J_i = \pm 1$,
$p(J_i|J_i^{out})$, conditional on the channel output $J_i^{out}$, is given by

$$\ln p(J_i|J_i^{out}) = c_1 + \ln q(J_i^{out}|J_i) = c_2 + h_i J_i \tag{4}$$

where $c_1$ and $c_2$ are constants (not depending on $J_i$) and

$$h_i = \frac{1}{2} \ln \frac{q(J_i^{out}|+1)}{q(J_i^{out}|-1)} \tag{5}$$
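
As an illustration of the local field $h_i$, the sketch below evaluates it numerically for one particular channel. It assumes additive Gaussian noise of standard deviation `w` and equally likely inputs $\pm 1$; the function names are hypothetical and only serve the example.

```python
import math

def gaussian_density(j_out, j_in, w):
    """q(J_out | J_in) for additive Gaussian noise of variance w**2 (assumed channel)."""
    return math.exp(-(j_out - j_in) ** 2 / (2 * w ** 2)) / (w * math.sqrt(2 * math.pi))

def local_field(j_out, w):
    """h = (1/2) ln[ q(J_out | +1) / q(J_out | -1) ], as in Eq. (5)."""
    return 0.5 * math.log(gaussian_density(j_out, +1, w) / gaussian_density(j_out, -1, w))

def posterior_plus(j_out, w):
    """p(J = +1 | J_out), assuming both inputs are a priori equally likely."""
    q_plus = gaussian_density(j_out, +1, w)
    q_minus = gaussian_density(j_out, -1, w)
    return q_plus / (q_plus + q_minus)

j_out, w = 0.3, 0.8
h = local_field(j_out, w)
p_plus = posterior_plus(j_out, w)
```

For this channel the logarithm of the density ratio is linear in the output, so `h` coincides with `j_out / w**2`, and `p_plus` equals `(1 + tanh(h)) / 2`, the single-spin Boltzmann weight implied by Eq. (4).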

It follows that

$$P^{code}(J|J^{out}) = c \prod_l \delta\big(M^{(l)}_{k_1 \cdots k_l}\, J_{k_1} \cdots J_{k_l}\,;\, 1\big)\, \exp\Big(\sum_i h_i J_i\Big) \tag{6}$$

where c is a normalising constant. The Kronecker $\delta$'s enforce the constraint that J obeys the parity
check equations (Eq. (2)), i.e. that it is a code-word. The $\delta$'s can be replaced by a soft constraint,

$$P^{code}(J|J^{out}) = \mathrm{const} \cdot \exp\Big[u \sum_l M^{(l)}_{k_1 \cdots k_l}\, J_{k_1} \cdots J_{k_l} + \sum_i h_i J_i\Big] \tag{7}$$

where $u \to \infty$. We now define the corresponding spin Hamiltonian by:

$$-H^{code}(J) = \ln P^{code}(J|J^{out}) = u \sum_l M^{(l)}_{k_1 \cdots k_l}\, J_{k_1} \cdots J_{k_l} + \sum_i h_i J_i \tag{8}$$
This is a spin system with multispin interactions, an infinite ferromagnetic coupling and a
random external magnetic field.

Alternatively, one may proceed by solving the parity check constraints, $J_i = C^{(i)}_{k_1 \cdots k_i}\, \sigma_{k_1} \cdots \sigma_{k_i}$
(i.e. by expressing the code-words in terms of the source-words):

$$P^{source}(\sigma|J^{out}) = \mathrm{const} \cdot \exp\Big(\sum_i h_i\, C^{(i)}_{k_1 \cdots k_i}\, \sigma_{k_1} \cdots \sigma_{k_i}\Big) \tag{9}$$

where the $h_i$'s are given as before. The negative logarithm of $P^{source}(\sigma|J^{out})$,

$$H^{source}(\sigma) = -\ln P^{source}(\sigma|J^{out}) = -\sum_i h_i\, C^{(i)}_{k_1 \cdots k_i}\, \sigma_{k_1} \cdots \sigma_{k_i} \tag{10}$$

has obviously the form of a spin glass Hamiltonian.

We have given two different statistical mechanics formulations of error correcting codes: one
in terms of the source-word probability $P^{source}$ and the other in terms of the code-word probability
$P^{code}$. Because of the one to one correspondence between code-words and source-words, the two
formulations are equivalent. In practice, however, it may make a difference: it may be more convenient
to work with $P^{source}$ or $P^{code}$, depending on the case. For the case of turbo codes (see later) it will
be more convenient to define another probability, the "register word" probability.

It follows that the most probable sequence ("word MAP decoding") is given by the ground
state of this Hamiltonian ($H^{code}$ or $H^{source}$, depending on the case). Instead of considering the
most probable instance, one may only be interested in the most probable value $\tau_i^p$ of the i'th "bit"
$\tau_i$ [9, 10, 11] ("symbol MAP decoding"). Because $\tau_i = \pm 1$, the probability $p_i$ for $\tau_i = 1$ is simply
related to $m_i$, the average of $\tau_i$: $p_i = (1 + m_i)/2$.

$$m_i = \frac{1}{Z} \sum_{\{\tau_1 \cdots \tau_N\}} \tau_i \exp(-H(\tau)), \qquad Z = \sum_{\{\tau_1 \cdots \tau_N\}} \exp(-H(\tau)), \qquad \tau_i^p = \mathrm{sign}(m_i) \tag{11}$$

In the previous equation $m_i$ is obviously the thermal average at temperature T = 1. It is amusing
to notice that T = 1 corresponds to Nishimori's temperature[12].
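
The two decoding rules can be compared by exhaustive enumeration on a very small code. The sketch below is an illustration under stated assumptions, not a practical decoder: the generating matrix `G` and the channel outputs `J_out` are hypothetical toy values, the noise is taken to be Gaussian with unit variance so that the fields are simply `h = J_out`, and the cost grows as $2^K$.

```python
import itertools
import numpy as np

# Hypothetical toy generating matrix (N = 5, K = 3), for illustration only.
G = np.array([[1, 0, 0],
              [0, 1, 0],
              [0, 0, 1],
              [1, 1, 0],
              [0, 1, 1]])

def source_energy(G, h, sigma):
    """H_source(sigma) = -sum_i h_i * (product of the spins selected by row i of G)."""
    J = np.array([np.prod(sigma[row == 1]) for row in G])
    return -float(np.dot(h, J))

def decode_by_enumeration(G, h):
    """Return the ground state (word MAP), the T = 1 magnetizations, and their signs."""
    K = G.shape[1]
    configs = np.array(list(itertools.product([1, -1], repeat=K)))
    energies = np.array([source_energy(G, h, s) for s in configs])
    word_map = configs[int(np.argmin(energies))]        # ground state of H_source
    weights = np.exp(-(energies - energies.min()))      # Boltzmann weights at T = 1
    m = weights @ configs / weights.sum()               # thermal averages m_i
    symbol_map = np.where(m >= 0, 1, -1)                # sign(m_i)
    return word_map, m, symbol_map

# Assumed channel outputs when the all-ones spin word was sent; with w = 1, h = J_out.
J_out = np.array([0.9, 1.2, -0.3, 0.8, 1.1])
word_map, m, symbol_map = decode_by_enumeration(G, J_out)
p_plus = (1 + m) / 2    # probability that each source spin equals +1
```

In this example both rules return the all-ones source word, but in general the ground state and the sequence of signs of the magnetizations need not coincide.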

When all messages are equally probable and the transmission channel is memoryless and symmetric,
i.e. when $q(J_i^{out}|J_i) = q(-J_i^{out}|-J_i)$, the error probability is the same for all input sequences.
It is enough to compute it in the case where all input spins are equal to one, i.e. when the transmitted
code-word is the all-zeros code-word. In this case, the error probability per bit $P_e$ is $P_e = (1 - m^{(d)})/2$,
where $m^{(d)} = \frac{1}{N} \sum_{i=1}^{N} \tau_i^{(d)}$ and $\tau_i^{(d)}$ is the symbol sequence produced by the decoding procedure.

This means that it is possible to compute the bit error probability, if one is able to compute
the magnetization in the corresponding spin system.

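Let me give a simple example of an R = 1/2 "convolutional" code. From the N source symbols
(letters) $u_i$ we construct the 2N code-word letters $x_k^1$, $x_k^2$, $k = 1, \cdots, N$.

$$x_i^1 = u_i + u_{i-1} + u_{i-2}, \qquad x_i^2 = u_i + u_{i-2} \tag{12}$$

It follows that

$$J_k^1 = \sigma_k \sigma_{k-1} \sigma_{k-2}, \qquad J_k^2 = \sigma_k \sigma_{k-2} \tag{13}$$

$$C^{(1,k)}_{k_1 k_2 k_3} = \delta_{k,k_1}\, \delta_{k_1,k_2+1}\, \delta_{k_1,k_3+2}, \qquad C^{(2,k)}_{k_1 k_2} = \delta_{k,k_1}\, \delta_{k_1,k_2+2} \tag{14}$$

The corresponding spin Hamiltonian is

$$-H = \frac{1}{w^2} \sum_k \Big( J_k^{out,1}\, \tau_k \tau_{k-1} \tau_{k-2} + J_k^{out,2}\, \tau_k \tau_{k-2} \Big) \tag{15}$$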
Here I assumed a Gaussian noise. In that case, Eq. (5) reduces to $h_i = J_i^{out}/w^2$, where $w^2$ is
the variance of the noise. This is a one dimensional spin glass Hamiltonian. In fact it is easy to
see that convolutional codes correspond to one dimensional spin systems. Their ground state can
be found using the T = 0 transfer matrix algorithm. This corresponds to the Viterbi algorithm
in coding theory. For symbol MAP (maximum a posteriori probability) decoding, one can use the
T = 1 transfer matrix algorithm. This in turn is the BCJR algorithm in coding theory [13].
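
The T = 0 transfer matrix recursion for the rate 1/2 code of Eq. (12) can be sketched as follows. This is a minimal illustration under assumptions of my own: the boundary spins are fixed to $\tau_0 = \tau_{-1} = +1$ (an encoder starting from an empty register), the factor $1/w^2$ is dropped because it does not change the ground state, and the channel outputs are synthetic. The state carried along the chain is the pair $(\tau_k, \tau_{k-1})$, and the result is compared with exhaustive enumeration.

```python
import itertools
import numpy as np

def local_term(j1, j2, t, t1, t2):
    """Contribution of site k to -H in Eq. (15), with t = tau_k, t1 = tau_{k-1}, t2 = tau_{k-2}."""
    return j1 * t * t1 * t2 + j2 * t * t2

def minus_H(J1, J2, taus):
    """-H for a full spin sequence, with assumed boundary spins tau_0 = tau_{-1} = +1."""
    full = (1, 1) + tuple(taus)
    return sum(local_term(J1[k], J2[k], full[k + 2], full[k + 1], full[k])
               for k in range(len(taus)))

def viterbi(J1, J2):
    """Max-plus (T = 0) transfer matrix sweep; state = (tau_k, tau_{k-1})."""
    best = {(1, 1): (0.0, [])}            # state -> (best score so far, best path so far)
    for j1, j2 in zip(J1, J2):
        new_best = {}
        for (t1, t2), (score, path) in best.items():
            for t in (1, -1):
                cand = score + local_term(j1, j2, t, t1, t2)
                key = (t, t1)
                if key not in new_best or cand > new_best[key][0]:
                    new_best[key] = (cand, path + [t])
        best = new_best
    return max(best.values(), key=lambda item: item[0])

def brute_force(J1, J2):
    """Exhaustive search over all 2**n sequences, for comparison only."""
    n = len(J1)
    return max(((minus_H(J1, J2, taus), list(taus))
                for taus in itertools.product((1, -1), repeat=n)),
               key=lambda item: item[0])

# Synthetic outputs: all-ones spin word sent through Gaussian noise (assumed example).
rng = np.random.default_rng(0)
n = 8
J1 = 1.0 + 0.8 * rng.normal(size=n)
J2 = 1.0 + 0.8 * rng.normal(size=n)
score_v, path_v = viterbi(J1, J2)
score_b, path_b = brute_force(J1, J2)
```

The sweep keeps four states per site, so its cost is linear in the chain length, whereas the enumeration is exponential. Replacing the maximum by a sum of Boltzmann weights in the same recursion would give the T = 1 version.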

As it is explained in Forney's paper in this volume, the newly discovered (or rediscovered) 
capacity approaching codes are based on random constructions. Using the equivalence explained 
above it has been possible to analyse them using the methods of statistical mechanics. 

Gallager's low density parity check (k, l) codes are defined by choosing at random a sparse
parity check matrix H as follows. H has N columns (we consider the case of code-words of length
N). Each column of H has k elements equal to one and all other elements equal to zero. Each
row has l non zero elements. It follows that H has Nk/l rows and that the rate of the code is
R = 1 - k/l. It follows from equation (8) that Gallager's (k, l) codes correspond to diluted spin
models with l-spin infinite strength ferromagnetic interactions in an external random field. It turns
out that the belief propagation algorithm, used to decode LDPC codes, amounts to an iterative
solution of the Thouless Anderson Palmer[14] (TAP) equations for spin glasses. A detailed analysis
of these codes is presented in Urbanke's paper in this volume. Low density parity check codes
have been analysed using statistical mechanics methods by Kabashima, Kanter and Saad[15, 16] in
the replica symmetric approximation. More recently Montanari[17] was able to establish the entire
phase diagram of LDPC codes. For $k, l \to \infty$ with k/l fixed, he showed that (k, l) codes correspond
to a random energy model which can be solved without replicas. There is a phase transition in
this model, which occurs at a critical value of the noise $n_c$. $n_c$ separates a zero error phase from
a high error phase. $n_c$ in this case equals the value provided by Shannon's channel capacity. For
finite k and l he found an exact one step replica symmetry breaking solution. The location of the
phase transition determines $n_c$. In this way he also computed, for finite values of k and l, the critical
value of the noise below which error free communication is possible. A different value of $n_c$, $n_c^{bp}$,
had already been computed by Richardson and Urbanke[18] (see Urbanke's paper in this volume).
Richardson and Urbanke compute $n_c^{bp}$ by analysing the behaviour of the decoding algorithm, belief
propagation in this case. Statistical mechanics provides a threshold $n_c$ which in principle is different
from $n_c^{bp}$. $n_c$ is reached by the optimum (but unknown) decoder.
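
The counting argument for the (k, l) ensemble can be checked on a sampled matrix. The sketch below is one possible way to draw such a matrix, assumed here for illustration and not necessarily Gallager's original construction: column and row "sockets" are matched by a random permutation, and matchings that would place two ones on the same entry are rejected. The function name and parameters are hypothetical.

```python
import numpy as np

def regular_parity_check(N, k, l, seed=0, max_tries=20000):
    """Sample a binary H with k ones per column and l ones per row (rejection sampling)."""
    if (N * k) % l != 0:
        raise ValueError("N * k must be divisible by l")
    n_rows = N * k // l
    rng = np.random.default_rng(seed)
    col_sockets = np.repeat(np.arange(N), k)        # each column index appears k times
    row_sockets = np.repeat(np.arange(n_rows), l)   # each row index appears l times
    for _ in range(max_tries):
        perm = rng.permutation(col_sockets)
        H = np.zeros((n_rows, N), dtype=int)
        np.add.at(H, (row_sockets, perm), 1)        # accumulate, counting repeated entries
        if H.max() == 1:                            # keep only matchings without repeats
            return H
    raise RuntimeError("no valid matrix found; try another seed or more tries")

N, k, l = 24, 3, 6
H = regular_parity_check(N, k, l)
design_rate = 1 - k / l
```

The sampled `H` has `N * k / l` rows, as stated above. Note that `design_rate` equals the true rate only if the rows of `H` are linearly independent modulo two, which this sketch does not check.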

Turbo codes have also been analysed using statistical mechanics[19, 20]. Turbo codes are based
on recursive convolutional codes. An example of a non recursive convolutional code was given in Eq.
(12). The corresponding recursive code is given, most conveniently, in terms of the auxiliary bits
$b_i$, defined below. The $b_i$'s are stored in the encoder's memory registers; that is why I call b the
"register word".

$$x_i^1 = u_i, \qquad x_i^2 = b_i + b_{i-2}, \qquad b_i = u_i + b_{i-1} + b_{i-2} \tag{16}$$

It follows that the source letters $u_i$ are given in terms of the auxiliary "register letters" $b_i$,

$$u_i = b_i + b_{i-1} + b_{i-2} \tag{17}$$

All additions are modulo two.
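
A short sketch of the recursive encoder makes the role of the register word concrete. It assumes that the register starts empty, i.e. $b_0 = b_{-1} = 0$, which is not specified above; the function names and the source word are illustrative.

```python
def recursive_encode(u):
    """Recursive encoder of Eq. (16); returns (x1, x2, b). Assumes b_0 = b_{-1} = 0."""
    b_prev1, b_prev2 = 0, 0          # b_{i-1} and b_{i-2}
    x1, x2, b = [], [], []
    for u_i in u:
        b_i = (u_i + b_prev1 + b_prev2) % 2
        x1.append(u_i)
        x2.append((b_i + b_prev2) % 2)
        b.append(b_i)
        b_prev1, b_prev2 = b_i, b_prev1
    return x1, x2, b

def source_from_register(b):
    """Invert the recursion as in Eq. (17): u_i = b_i + b_{i-1} + b_{i-2} (mod 2)."""
    padded = [0, 0] + list(b)        # same empty-register assumption
    return [(padded[i + 2] + padded[i + 1] + padded[i]) % 2 for i in range(len(b))]

u = [1, 0, 1, 1, 0, 0, 1]
x1, x2, b = recursive_encode(u)
u_recovered = source_from_register(b)
```

Here `u_recovered` equals `u`, so the map between source words and register words is one to one, which is what allows the statistical mechanics to be formulated in terms of the register word.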

To construct a turbo code, one artificially considers a second source word v, obtained by performing a
permutation, chosen at random, on the original source word u. So one considers $v_i = u_{P(i)}$, where
$j = P(i)$ is a (random) permutation of the K indices i, and a second "register word" c, $c_i =
v_i + c_{i-1} + c_{i-2}$. Obviously

$$v_i = c_i + c_{i-1} + c_{i-2} = u_j = b_j + b_{j-1} + b_{j-2}, \qquad j = P(i) \tag{18}$$
Eq. (18) can be viewed as a constraint on the two register words b and c. Finally, in the present
example, a rate R = 1/3 turbo code, one transmits the 3K letter code-word $x_i^1 = u_i$, $x_i^2 = b_i + b_{i-2}$,
$x_i^3 = c_i + c_{i-2}$, $i = 1, \cdots, K$. Let's call, as before,

$$J_i^{\alpha} = (-1)^{x_i^{\alpha}}, \qquad \alpha = 1, 2, 3$$

the channel inputs and $J_i^{out,\alpha}$ the channel outputs. In the previous sections, for reasons of convenience, we
formulated convolutional codes using the source-word probability $P^{source}$ and LDPC codes using
the code-word probability $P^{code}$. The statistical mechanics of turbo codes is most conveniently
formulated in terms of the "register words" probability $P^{reg}(\sigma, \tau|J^{out})$, conditional on the channel
outputs $J^{out}$, where $\tau_i = (-1)^{b_i}$ and $\sigma_i = (-1)^{c_i}$. The logarithm of this probability provides the
spin Hamiltonian

$$-H = \frac{1}{w^2} \sum_k \Big( J_k^{out,1}\, \tau_k \tau_{k-1} \tau_{k-2} + J_k^{out,2}\, \tau_k \tau_{k-2} + J_k^{out,3}\, \sigma_k \sigma_{k-2} \Big) \tag{19}$$

Because of Eq. (18), the two spin chains $\tau$ and $\sigma$ obey the constraints

$$\sigma_i \sigma_{i-1} \sigma_{i-2} = \tau_j \tau_{j-1} \tau_{j-2}, \qquad j = P(i) \tag{20}$$
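
The non local constraint between the two chains can be verified numerically. The sketch below assumes empty initial registers (boundary spins equal to +1) and uses an arbitrary seeded random permutation and source word; all names are illustrative.

```python
import random

def register_word(u):
    """Register bits b_i = u_i + b_{i-1} + b_{i-2} (mod 2), assuming b_0 = b_{-1} = 0."""
    b, prev1, prev2 = [], 0, 0
    for u_i in u:
        b_i = (u_i + prev1 + prev2) % 2
        b.append(b_i)
        prev1, prev2 = b_i, prev1
    return b

def triple_products(bits):
    """s_i * s_{i-1} * s_{i-2} for spins s = (-1)**bit, with boundary spins +1."""
    s = [1, 1] + [1 - 2 * x for x in bits]
    return [s[i + 2] * s[i + 1] * s[i] for i in range(len(bits))]

rng = random.Random(1)
K = 10
u = [rng.randint(0, 1) for _ in range(K)]
P = list(range(K))
rng.shuffle(P)                          # random permutation j = P(i)
v = [u[P[i]] for i in range(K)]         # permuted source word

b = register_word(u)                    # first register word  -> spins tau
c = register_word(v)                    # second register word -> spins sigma
tau3 = triple_products(b)
sigma3 = triple_products(c)
constraint_holds = all(sigma3[i] == tau3[P[i]] for i in range(K))
```

Each triple product of register spins reproduces the corresponding source spin, so `constraint_holds` is true for any permutation. The coupling is long range because `P[i]` and `P[i + 1]` are in general far apart along the first chain.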

(As previously, we have considered the case of a Gaussian noise of variance $w^2$.) This is an
unusual spin Hamiltonian. Two short range one dimensional chains are coupled through the infinite
range, non local constraint, Eq. (20). This constraint is non local because neighboring i's are not
mapped to neighboring j's under the random permutation. It turns out that this Hamiltonian can
be solved by the replica method. One finds a phase transition at a critical value of the noise $n_{crit}$.
For noise less than $n_{crit}$, the magnetization equals one, i.e. it is possible to communicate error
free. In this respect, turbo codes are similar to Gallager's LDPC codes. The statistical mechanical
models, however, are completely different. Let me also mention that, under some reasonable assumptions,
the iterative decoding algorithm for turbo codes (turbodecoding algorithm), which I am not
explaining here, can be viewed[20] as a time discretisation of the Kolmogorov, Petrovsky and
Piscounov equation[21]. This KPP equation has traveling wave solutions. The velocity of the traveling
wave, which is computable analytically, corresponds to the convergence rate of the turbodecoding
algorithm. The agreement with numerical simulations is excellent.

So the equivalence between linear codes and theoretical models of spin glasses is quite general
and we have established the following dictionary of correspondence.

Error-correcting code                <=>  Spin Hamiltonian
Signal to noise                      <=>  [illegible in source]
Maximum likelihood decoding          <=>  Find a ground state
Error probability per bit            <=>  Ground state magnetization
Sequence of most probable symbols    <=>  Magnetization at temperature T = 1
Convolutional codes                  <=>  One dimensional spin glasses
Viterbi decoding                     <=>  T = 0 transfer matrix algorithm
BCJR decoding                        <=>  T = 1 transfer matrix algorithm
Gallager LDPC codes                  <=>  Diluted p-spin ferromagnets in a random field
Turbo codes                          <=>  Coupled spin chains
Zero error threshold                 <=>  Phase transition point
Belief propagation algorithm         <=>  Iterative solution of TAP equations

I would like to conclude by pointing out some open questions.

What is the order of the phase transition? This question is particularly relevant for turbo codes
and has important implications for decoding.

What are the finite size effects? This question is particularly relevant near the zero error noise
threshold (i.e. near the phase transition). The answer will depend on the order of the transition.

How does the decoding complexity behave as one approaches the zero error noise threshold?
Is there a critical slowing down? As was said before, the decoding algorithms both for LDPC
codes and turbo codes are heuristic and there are no known results as one approaches the phase
transition.

Is there a glassy phase in decoding? In other terms, do the heuristic decoding algorithms reach
the threshold of optimum decoding, computed by statistical mechanics, or is there a (lower) noise
"dynamical" threshold where decoding stops reaching optimal performance?

I hope that at least some of the above questions will be answered in the near future. 






References 



[1] Sourlas, N., Nature 339, 693 (1989).
[2] Sourlas, N., in Statistical Mechanics of Neural Networks, Lecture Notes in Physics 368, ed. L.
Garrido, Springer Verlag (1990).
[3] Sourlas, N., Ecole Normale Superieure preprint (April 1993).
[4] Sourlas, N., in From Statistical Physics to Statistical Inference and Back, ed. P. Grassberger
and J.-P. Nadal, Kluwer Academic (1994) p. 195.
[5] Berrou, C., Glavieux, A. and Thitimajshima, P., Proc. 1993 Int. Conf. Comm., 1064-1070.
[6] MacKay, D. J. C. and Neal, R. M., Elect. Lett. 33, 457 (1997).
[7] Gallager, R. G., IRE Trans. Inform. Theory IT-8, 21 (1962).
[8] Gallager, R. G., Low-Density Parity-Check Codes, MIT Press, Cambridge MA (1963).
[9] Rujan, P., Phys. Rev. Lett. 70, 2968 (1993).
[10] Nishimori, H., J. Phys. Soc. Jpn. 62, 2973 (1993).
[11] Sourlas, N., Europhys. Lett. 25, 169 (1994).
[12] Nishimori, H., Progr. Theor. Phys. 66, 1169 (1981).
[13] Bahl, L., Cocke, J., Jelinek, F. and Raviv, J., IEEE Trans. Inf. Theory IT-20, 284-287 (1974).
[14] Thouless, D. J., Anderson, P. W. and Palmer, R. G., Phil. Mag. 35, 593 (1977).
[15] Kanter, I. and Saad, D., Phys. Rev. Lett. 83, 2660 (1999).
[16] Kabashima, Y., Murayama, T. and Saad, D., Phys. Rev. Lett. 84, 1355 (2000).
[17] Montanari, A., cond-mat/0104079.
[18] Richardson, T. J. and Urbanke, R. L., IEEE Trans. Inform. Theory 47, 638 (2001).
[19] Montanari, A. and Sourlas, N., Eur. Phys. J. B 18, 107 (2000).
[20] Montanari, A., Eur. Phys. J. B 18, 121 (2000).
[21] Kolmogorov, A., Petrovsky, I. and Piscounov, N., Moscou Univ. Math. Bull. 1, 1 (1937).






